Windows R: https://cran.r-project.org/bin/windows/base/
Windows RStudio: https://rstudio.com/products/rstudio/download/
For Mac:
Open an internet browser and go to https://www.r-project.org
Click the “download R” link in the middle of the page under “Getting Started.”
Select a CRAN location (a mirror site) and click the corresponding link.
Click on the “Download R for (Mac) OS X” link at the top of the page.
Click on the file containing the latest version of R under “Files.”
Save the .pkg file, double-click it to open, and follow the installation instructions.
Now that R is installed, you need to download and install RStudio.
Go to www.rstudio.com and click on the “Download RStudio” button.
Click on “Download RStudio Desktop.”
Click on the version recommended for your system, or the latest Mac version; save the .dmg file on your computer, double-click it to open, and then drag RStudio into your Applications folder.
install.packages("swirl")
library("swirl")
# swirl()  # uncomment to start the interactive swirl lessons
By the end of this workshop, you won’t become an expert in time series analysis and forecasting, but you will (hopefully) be:
All of today’s slides, code, and R Markdown files are available on GitHub.
Downloading the workshop material from the terminal:
git clone https://github.com/makanig/solarHomeEnergyAnalytics.git
Time series analysis is commonly used in many fields of science, such as economics, finance, physics, engineering, and astronomy. Using time series analysis to understand past events and to predict future ones did not start with the introduction of the stochastic process during the past century: ancient civilizations such as the Greeks, Romans, and Mayans studied cyclical events such as the weather and the movements of celestial bodies, and learned how to use them to predict future events.
Time series analysis is the art of extracting meaningful insights from time-series data in order to learn about past events and to predict future ones.
This process includes the following steps:
A typical analysis process in R will look like this (picture credit: Rami & Danton):
There are multiple classes in R for time-series data; the most common are:

- ts for regular time-series data, with mts for multivariate series; this is the most common class for time series data
- xts and zoo for both regular and irregular time-series data, mainly popular in the financial field
- tsibble, a tidy format for time series, supporting both regular and irregular time-series data

The TSstudio package provides a set of functions for time series analysis, including interactive data visualization tools based on the plotly package engine and supporting multiple time series objects such as ts, xts, and zoo. The following diagram demonstrates the workflow of the TSstudio package:
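As a quick base-R illustration of the most common of these classes, here is a minimal sketch of creating and subsetting a regular monthly ts object (the xts, zoo, and tsibble conversions work similarly but need their packages loaded; the values here are simulated):

```r
# A regular monthly series: 24 observations starting January 2018
x <- ts(rnorm(24), start = c(2018, 1), frequency = 12)
frequency(x)                        # 12 observations per cycle (monthly)
start(x)                            # year 2018, period 1
h2 <- window(x, start = c(2018, 7), end = c(2018, 12))  # second half of 2018
length(h2)                          # 6 observations
```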
Time series data is a sequence of values, each associated with a unique point in time, that can be divided into the following two groups:
Note: typically, the term time-series data refers to regular time-series data. Therefore, unless stated otherwise, throughout the workshop the term time series (or series) refers to regular time-series data.
With time series analysis, you can answer questions such as:
This builds on what you saw in the previous Time Series Analysis workflow slide, adding model comparison and financial sensitivity analysis.
## Start of program
## Check raw data - exploratory analysis
setwd(dataDir)
# get generation
solarDf = tbl_df(read.csv(solarProductionDaily, stringsAsFactors = F))
colnames(solarDf) <- c("DATE", "energyProducedWh")
# skip the total row
solarDf = solarDf %>% mutate(DATE = as.Date(DATE,"%Y-%m-%d")) %>% na.omit()
solarDfTsibble = solarDf %>% select(DATE, energyProducedWh) %>% as_tsibble()
ts_plot(solarDfTsibble,
title = "Daily Solar Generation",
Ytitle = "Wh Daily Production",
# Xtitle = "Date",
slider = TRUE)
# Look at the fixed data
setwd(dataDir)
solarDf = getSolarDf()
solarDfTsibble = solarDf %>% select(DATE, energyProducedWh) %>% as_tsibble()
ts_plot(solarDfTsibble,
title = "Daily Solar Generation",
Ytitle = "Wh Daily Production",
# Xtitle = "Date",
slider = TRUE)
# Get the daily temperature and precipitation
weatherDf = tbl_df(read.csv(weatherData, stringsAsFactors = F)) %>%
select( DATE, TMAX, TMIN, PRCP)
weatherDf$DATE = as.Date(weatherDf$DATE,"%Y-%m-%d")
weatherDf$TEMP = (weatherDf$TMAX + weatherDf$TMIN)/2
# break up each of the attributes with gather
weatherDfGather = weatherDf %>% gather("attribute", "value", -DATE)
weatherDfGatherTs = nest_AddMissingRows_xts(weatherDfGather, "attribute", "DATE", "value")
weatherDfDaily = weatherDfGatherTs %>% spread(key = attribute, value=data.tsClean) %>% unnest() %>%
mutate(DATE=tk_index(weatherDfGatherTs$data.tsClean[[1]] , timetk_idx = TRUE))
# include daily weather to Solar dataframe
solarDf = solarDf %>% inner_join(weatherDfDaily) %>% rowwise() %>%
mutate(DayLength = daylength(myLatitude, DATE))
# generate time series and plot seasonal
solarTs = ts(solarDf$energyProducedWh, start=c(solarDf$year[[1]], solarDf$yDay[[1]]), frequency = 365)
ts_seasonal(solarTs)
# examine the plot below for stationarity and order
setwd(dataDir)
solarTsAdj = solarTs %>% stl(s.window='periodic')%>% seasadj()
autoplot(solarTsAdj) # shows non-stationary, so use diff()
solarTsAdj %>% diff() %>% ggtsdisplay(main="")
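stl() and seasadj() above come from the stats/forecast toolchain; the reason diff() helps is that first differencing removes a deterministic linear trend, stabilizing the mean. A minimal base-R sketch on a simulated series:

```r
set.seed(1)
t_idx <- 1:200
y  <- 0.5 * t_idx + rnorm(200)   # mean grows with time: non-stationary
dy <- diff(y)                    # first difference removes the trend
mean(dy)                         # fluctuates around the slope, 0.5
sd(dy)                           # roughly constant noise level
```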
# do the same for the log transformed, bounded values
# To forecast with ARIMA, first look at the profile of the entire series; the seasonal
# component of the decomposition shows only tiny weekly fluctuations, so look for a better model
solarTsLog = ts(solarDf$energyProducedWhLog, start=c(solarDf$year[[1]], solarDf$yDay[[1]]), frequency = 365)
# comp = decompose(solarTsLog)
# plot(comp)
# # the seasonal pattern above is not too good, look for a better one
# examine the log plot below for stationarity and order
solarTsLogAdj = solarTsLog %>% stl(s.window='periodic')%>% seasadj()
autoplot(solarTsLogAdj) # shows non-stationary, so use diff()
solarTsLogAdj %>% diff() %>% ggtsdisplay(main="")
setwd(dataDir)
nReadings = nrow(solarDf)
# Use this to fit a sinusoidal pattern
solarDf$xc<-cos(2*pi*solarDf$DATE_Numeric/365)
solarDf$xs<-sin(2*pi*solarDf$DATE_Numeric/365)
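xc and xs are first-order Fourier terms encoding annual seasonality: together they trace one full cycle every 365 days, letting a linear model fit a smooth yearly pattern with just two coefficients. A quick base-R check of their behavior:

```r
day <- 0:364
xc  <- cos(2 * pi * day / 365)   # peaks at day 0, troughs mid-year
xs  <- sin(2 * pi * day / 365)   # same cycle, shifted a quarter period
range(xc)                        # roughly -1 to 1 over the year
# every (xc, xs) pair lies on the unit circle: xc^2 + xs^2 == 1
max(abs(xc^2 + xs^2 - 1))
```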
# Split into training and test
halfWay = floor(nReadings/2)
trainingDf = as_tibble(solarDf[1:halfWay,]) %>% na.omit()
testDf = as_tibble(solarDf[(halfWay+1):nReadings,]) %>% na.omit() # parentheses needed: ':' binds tighter than '+'
trainingLen = halfWay
testLen = nReadings - (halfWay+1) +1
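One subtlety in index-based splits: in R the `:` operator binds tighter than `+`, so an expression like `halfWay + 1:nReadings` means `halfWay + (1:nReadings)` and runs past the end of the data; the out-of-range rows come back as NA and are only rescued if na.omit() drops them later. A toy demonstration:

```r
v    <- 1:10
half <- 5
v[half + 1:10]     # 5 + (1:10) = indices 6..15: values 6..10, then five NAs
v[(half + 1):10]   # indices 6..10: the intended second half, 6 7 8 9 10
```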
setwd(dataDir)
decompose_df <- glm(energyProducedWhLog ~ xc+PRCP +DATE_Numeric*xc +
TMAX + TMIN, data=solarDf)
summary(decompose_df)
##
## Call:
## glm(formula = energyProducedWhLog ~ xc + PRCP + DATE_Numeric *
## xc + TMAX + TMIN, data = solarDf)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.57045 -0.19717 0.09251 0.27796 1.12271
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.010e+00 5.320e-01 7.538 8.28e-14 ***
## xc -3.330e+00 6.546e-01 -5.087 4.10e-07 ***
## PRCP -7.399e-01 6.296e-02 -11.752 < 2e-16 ***
## DATE_Numeric -2.911e-04 2.776e-05 -10.487 < 2e-16 ***
## TMAX 9.930e-03 1.916e-03 5.182 2.50e-07 ***
## TMIN -8.066e-03 2.586e-03 -3.119 0.00185 **
## xc:DATE_Numeric 8.333e-05 3.759e-05 2.217 0.02677 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.1948883)
##
## Null deviance: 3122.33 on 1486 degrees of freedom
## Residual deviance: 288.43 on 1480 degrees of freedom
## AIC: 1797.2
##
## Number of Fisher Scoring iterations: 2
trend <- coef(decompose_df)[1] + coef(decompose_df)['DATE_Numeric']*solarDf$DATE_Numeric
components <- cbind(
data = solarTsLog,
fitted = decompose_df$fitted.values,
trend = trend,
season = coef(decompose_df)['xc']*solarDf$xc +
coef(decompose_df)['xc:DATE_Numeric']*solarDf$xc*solarDf$DATE_Numeric,
prcp = coef(decompose_df)['PRCP']*solarDf$PRCP,
TMAX = coef(decompose_df)['TMAX']*solarDf$TMAX,
TMIN = coef(decompose_df)['TMIN']*solarDf$TMIN,
remainder = residuals(decompose_df)
)
p = (autoplot(components, facet=TRUE) + ggtitle("GLM model components"))
p %>% ggplotly()
RSS <- c(crossprod(decompose_df$residuals))
#Mean squared error:
MSE <- RSS / length(decompose_df$residuals)
#Root MSE:
RMSE <- sqrt(MSE)
#Pearson estimated residual variance (as returned by summary.lm):
sig2 <- RSS / decompose_df$df.residual
gof = 1 - (decompose_df$deviance/decompose_df$null.deviance)
print(paste("RMSE:", RMSE, " Pearson residual variance", sig2, " GOF:", gof))
## [1] "RMSE: 0.440421251694386 Pearson residual variance 0.194888308776893 GOF: 0.907621987482646"
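The gof statistic above is a pseudo-R-squared of the form 1 - residual deviance / null deviance; for a Gaussian GLM with an intercept this coincides with the familiar 1 - RSS/TSS from ordinary regression. A minimal sketch on the built-in cars dataset:

```r
fit  <- lm(dist ~ speed, data = cars)        # built-in dataset, 50 rows
rss  <- sum(residuals(fit)^2)
rmse <- sqrt(rss / nrow(cars))               # root mean squared error
tss  <- sum((cars$dist - mean(cars$dist))^2)
gof  <- 1 - rss / tss                        # equals summary(fit)$r.squared
round(c(rmse = rmse, gof = gof), 3)
```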
setwd(dataDir)
tsStationary = diff(solarTsLog,differences=1)
# Forecast with ARIMA, use the training set for the fit
solarDfTrainingTsLog = ts(data=trainingDf$energyProducedWhLog, start=c(trainingDf$year[[1]], trainingDf$yDay[[1]]),
frequency=365)
if (useRDSFiles) {
ar_211_010 = readRDS("arModel.rds")
} else {
ar_211_010 = Arima(solarDfTrainingTsLog, order = c(2,1,1),seasonal = c(0,1,0))
saveRDS(ar_211_010, "arModel.rds")
}
summary(ar_211_010)
## Series: solarDfTrainingTsLog
## ARIMA(2,1,1)(0,1,0)[365]
##
## Coefficients:
## ar1 ar2 ma1
## 0.3535 0.1000 -1.0000
## s.e. 0.0513 0.0522 0.0091
##
## sigma^2 estimated as 0.411: log likelihood=-371.73
## AIC=751.47 AICc=751.57 BIC=767.2
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01495805 0.4548676 0.2319034 18.38288 50.35202 0.4301588
## ACF1
## Training set 0.008524465
plot(x=ar_211_010$x, y = ar_211_010$fitted)
checkresiduals(ar_211_010)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,1)(0,1,0)[365]
## Q* = 226.45, df = 146, p-value = 2.22e-05
##
## Model df: 3. Total lags used: 149
# Ljung-Box test
#
# data: Residuals from ARIMA(2,1,1)(0,1,0)[365]
# Q* = 226.45, df = 146, p-value = 2.22e-05
#
# Model df: 3. Total lags used: 149
if (useRDSFiles) {
aF = readRDS("aF_Files.rds")
} else {
aF = forecast(ar_211_010,h = testLen)
saveRDS(aF, "aF_Files.rds")
}
plot(aF$mean)
#arimaForecastValues = as_tibble(aF$mean)
# GLM Weather independent
# https://stats.stackexchange.com/questions/47840/linear-model-with-log-transformed-response-vs-generalized-linear-model-with-log
weatherIndependentGlmModel = glm(energyProducedWhLog ~ DATE_Numeric + xc+ DATE_Numeric*xc + DATE_Numeric*xs + xs +
DATE_Numeric*dRank + xc*dRank + xs*dRank,
# family="gaussian"( link="log" ),
data = trainingDf)
summary(weatherIndependentGlmModel)
##
## Call:
## glm(formula = energyProducedWhLog ~ DATE_Numeric + xc + DATE_Numeric *
## xc + DATE_Numeric * xs + xs + DATE_Numeric * dRank + xc *
## dRank + xs * dRank, data = trainingDf)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.80681 -0.13427 0.07929 0.30124 0.89137
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.510e+01 2.292e+01 1.095 0.273812
## DATE_Numeric -1.520e-03 1.346e-03 -1.129 0.259109
## xc -9.200e+00 1.875e+01 -0.491 0.623749
## xs -1.862e+00 2.377e+00 -0.783 0.433651
## dRank -3.129e+01 4.592e+01 -0.682 0.495762
## DATE_Numeric:xc 4.231e-04 1.101e-03 0.384 0.700971
## DATE_Numeric:xs 1.230e-04 1.401e-04 0.878 0.380491
## DATE_Numeric:dRank 1.846e-03 2.698e-03 0.684 0.494083
## xc:dRank 1.305e-01 1.219e-01 1.070 0.285029
## xs:dRank -5.493e-01 1.546e-01 -3.554 0.000404 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.2451588)
##
## Null deviance: 1648.7 on 742 degrees of freedom
## Residual deviance: 179.7 on 733 degrees of freedom
## AIC: 1075.9
##
## Number of Fisher Scoring iterations: 2
weatherIndependentGlmForecast <- stats::predict(weatherIndependentGlmModel, type = "response", newdata=testDf) %>%
tibble::enframe()
weatherIndependentErr <- stats::predict(weatherIndependentGlmModel, newdata=testDf, se = TRUE)
weatherBasedGlmModel = glm(energyProducedWhLog ~ xc+PRCP +DATE_Numeric*xc +
TMAX + TMIN,
# energyProducedWhLog ~ DATE_Numeric + dRank + PRCP + xc + xc*dRank + DATE_Numeric*xc +
# TMAX + TMIN + DATE_Numeric*xs + DATE_Numeric*dRank + xs*dRank,
#family="gaussian"( link="log" ),
data = trainingDf)
summary(weatherBasedGlmModel)
##
## Call:
## glm(formula = energyProducedWhLog ~ xc + PRCP + DATE_Numeric *
## xc + TMAX + TMIN, data = trainingDf)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.54237 -0.18897 0.08387 0.26581 0.99582
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.373e+00 1.494e+00 5.603 2.98e-08 ***
## xc -2.126e+00 1.916e+00 -1.110 0.267
## PRCP -8.540e-01 9.416e-02 -9.069 < 2e-16 ***
## DATE_Numeric -5.390e-04 8.407e-05 -6.412 2.57e-10 ***
## TMAX 1.408e-02 2.805e-03 5.021 6.47e-07 ***
## TMIN -1.608e-02 3.932e-03 -4.090 4.79e-05 ***
## xc:DATE_Numeric 1.175e-05 1.125e-04 0.104 0.917
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.2044866)
##
## Null deviance: 1648.7 on 742 degrees of freedom
## Residual deviance: 150.5 on 736 degrees of freedom
## AIC: 938.18
##
## Number of Fisher Scoring iterations: 2
weatherBasedGlmForecast <- stats::predict(weatherBasedGlmModel, type = "response", newdata=testDf) %>%
tibble::enframe()
weatherBasedErr <- stats::predict(weatherBasedGlmModel, newdata=testDf, se = TRUE)
allTestVals = tibble(DATE = testDf$DATE,
ArimaVals= backTransformLogValues(aF$mean),
WeatherIndependent = backTransformLogValues(weatherIndependentGlmForecast$value),
WeatherBasedSeasonal = backTransformLogValues(weatherBasedGlmForecast$value))
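backTransformLogValues() is a helper defined in the workshop repository; its definition is not shown here. Assuming the forward transform was a plain natural log, the inverse would simply be exp() (a sketch under that assumption; if the forward transform were log1p(), the inverse would be expm1() instead):

```r
# hypothetical stand-in for backTransformLogValues(), assuming y = log(x)
backTransformLog <- function(y) exp(y)
x <- c(1500, 22000, 41000)                 # Wh-scale daily production values
y <- log(x)                                # forward transform used for modeling
all.equal(backTransformLog(y), x)          # round-trips back to the Wh scale
```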
dfAllTestVals = solarDf %>% dplyr::left_join(allTestVals, by = "DATE")
r = dfAllTestVals %>%
select(DATE, energyProducedWh, ArimaVals, WeatherIndependent, WeatherBasedSeasonal) %>%
mutate(Actuals=energyProducedWh) %>% select(-energyProducedWh) %>%
melt(id.vars = c("DATE"), variable.name="Model")
g <- ggplot(data = r,
aes(x=DATE, y=value)) + ggtitle("Solar generation - Model comparison") +
xlab(NULL) + ylab("Watt hours Daily Production") +
# geom_line(aes(colour=variable, group=variable)) +
geom_point(aes(colour=Model, shape=Model, group=Model), size=1)
g %>% ggplotly()
dfAllTestDiffs = dfAllTestVals %>% mutate(ArimaValsDiff = energyProducedWh - ArimaVals,
WeatherIndependentDiff = energyProducedWh - WeatherIndependent,
WeatherBasedSeasonalDiff = energyProducedWh - WeatherBasedSeasonal)
rDiff = dfAllTestDiffs %>%
select(DATE, ArimaValsDiff, WeatherIndependentDiff, WeatherBasedSeasonalDiff) %>%
melt(id.vars = c("DATE"), variable.name="Model")
g <- ggplot(data = rDiff,
aes(x=DATE, y=value)) + ggtitle("Solar generation - Model difference comparison") +
xlab(NULL) + ylab("Actual vs estimated daily Watt hrs") +
# geom_line(aes(colour=variable, group=variable)) +
geom_point(aes(colour=Model, shape=Model, group=Model), size=1)
g %>% ggplotly()
modelSummary = tibble(modelMSPE = c("ArimaMSPE","WeatherIndependentGLM_MSPE", "WeatherBasedSeasonalGLM_MSPE"),
modelMSPEVals = c(mean(dfAllTestDiffs$ArimaValsDiff ^ 2, na.rm =T),
mean(dfAllTestDiffs$WeatherIndependentDiff ^ 2, na.rm=T),
mean(dfAllTestDiffs$WeatherBasedSeasonalDiff ^ 2, na.rm=T))) %>%
arrange(modelMSPEVals)
# print out the final model summary
modelSummary
## # A tibble: 3 x 2
## modelMSPE modelMSPEVals
## <chr> <dbl>
## 1 WeatherBasedSeasonalGLM_MSPE 3896533.
## 2 WeatherIndependentGLM_MSPE 4344898.
## 3 ArimaMSPE 6356574.
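MSPE here is the mean squared prediction error over the held-out test set: the average of the squared gaps between actuals and forecasts, with smaller being better. A self-contained sketch with made-up numbers:

```r
actual <- c(100, 120,  90, 110)
predA  <- c(105, 115,  95, 100)   # hypothetical model A forecasts
predB  <- c( 90, 140,  70, 130)   # hypothetical model B forecasts
mspe <- function(y, yhat) mean((y - yhat)^2, na.rm = TRUE)
c(A = mspe(actual, predA), B = mspe(actual, predB))  # A: 43.75, B: 325
```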
setwd(dataDir)
######
##solarTsLog is the transformed time series
# redo the ARIMA model with the complete set of generated values
if (useRDSFiles) {
ar_FullDfModel = readRDS("ar_FullDfModel.rds")
} else {
ar_FullDfModel = Arima(solarTsLog, order = c(2,1,1),seasonal = c(0,1,0))
saveRDS(ar_FullDfModel, "ar_FullDfModel.rds")
}
year_15_EndDate = solarInstallDate %m+% years(15)
year_20_EndDate = solarInstallDate %m+% years(20)
year_25_EndDate = solarInstallDate %m+% years(25)
# forecast once upto the longest period and take the subset of gen values
fLen = as.integer(year_25_EndDate - as.Date(last(solarDf$DATE)))
fLen15 = as.integer(year_15_EndDate - as.Date(last(solarDf$DATE)))
fLen20 = as.integer(year_20_EndDate - as.Date(last(solarDf$DATE)))
if (useRDSFiles) {
ar_fullForecast = readRDS("ar_fullForecast.rds")
} else {
ar_fullForecast = forecast(ar_FullDfModel, h = fLen)
saveRDS(ar_fullForecast, "ar_fullForecast.rds")
}
# getting dates back from timeseries is tricky
myForecastTimes = as.vector(time(ar_fullForecast$mean))
myForecastedDates = as.Date(format(date_decimal(myForecastTimes), "%Y-%m-%d"))
myActualTimes = as.vector(time(solarTs))
myActualDates = as.Date(format(date_decimal(myActualTimes), "%Y-%m-%d"))
forecastDf = tibble(DATE=myForecastedDates,
ArimaPredictedDailyWh= as.numeric(backTransformLogValues(ar_fullForecast$mean)))
actualDf = tibble(DATE = myActualDates,
ActualDailyGenerationWh = as.numeric(solarTs))
fDif1 = forecastDf %>% melt(id.vars = c("DATE"), variable.name="Attribute")
actualDf1 = actualDf %>% melt(id.vars = c("DATE"), variable.name="Attribute")
fullGenDf = rbind(actualDf1,fDif1) # %>% mutate(DATE=format(date_decimal(DATE),"%Y-%m-%d"))
# x axis gets too busy, translate after plotting
g <- ggplot(data = fullGenDf,
aes(x=DATE, y=value)) + ggtitle("Solar generation ARIMA forecast") +
xlab(NULL) + ylab("Actual and forecasted daily Watt hrs") +
# geom_line(aes(colour=variable, group=variable)) +
geom_point(aes(colour=Attribute, shape=Attribute, group=Attribute), size=1)
g %>% ggplotly()
setwd(dataDir)
# Financial and TOU calculations
fullGenDf = fullGenDf %>% mutate(month = month(DATE),
year = year(DATE))
fullGenDf$DATE = as.Date(fullGenDf$DATE)
fullGenMonthlyDf = fullGenDf %>% group_by(year, month) %>% summarize(monthlyGenKwh=sum(value)/1000) %>%
mutate(DATE=make_date(year, month, 1)) %>% ungroup()
# Model 3 scenarios, two increasing electricity prices, another flat
electricityCostDf1 = as_tibble(list(DATE=c(as.Date("2018-07-01"),as.Date("2019-07-01")),
costPerKwhPeak = c(0.371, 0.382),
costPerKwhPartPeak = c(0.256, 0.267),
costPerKwhOffPeak = c(0.18, 0.19),
costProfile = "CostProfile1",
model = "linear"))
electricityCostDf2 = as_tibble(list(DATE=c(as.Date("2015-07-01"), as.Date("2019-07-01"), as.Date("2025-07-01")),
costPerKwhPeak = c(0.338, 0.382, 0.382),
costPerKwhPartPeak = c(0.221, 0.267, 0.267),
costPerKwhOffPeak = c(0.16, 0.19, 0.19),
costProfile = "CostProfile2",
model = "linear"))
electricityCostDf3 = as_tibble(list(DATE=c(as.Date("2018-07-01"), as.Date("2019-07-01"), as.Date("2025-07-01")),
costPerKwhPeak = c(0.372, 0.382, 0.382*2),
costPerKwhPartPeak = c(0.256, 0.267, 0.267*2),
costPerKwhOffPeak = c(0.18, 0.19, 0.19*2),
costProfile = "CostProfile3",
model = "log"))
electricityCostDf4 = as_tibble(list(DATE=c(as.Date("2018-07-01"), as.Date("2019-07-01"),
as.Date("2022-07-01"), as.Date("2038-07-01")),
costPerKwhPeak = c(0.372, 0.382, 0.19, 0.19),
costPerKwhPartPeak = c(0.256, 0.267, 0.15, 0.15),
costPerKwhOffPeak = c(0.18, 0.19, 0.11, 0.11),
costProfile = "CostProfile4",
model = "linear"))
costProfileList = list(electricityCostDf1,
electricityCostDf2,
electricityCostDf3,
electricityCostDf4)
# costProfileList = c(nest(electricityCostDf1),
# nest(electricityCostDf2),
# nest(electricityCostDf3))
genPreds = map(costProfileList,predictMonthlyGen,genMonthlyDf = fullGenMonthlyDf)
#r = rbindlist(genPreds)
# %>% group_by(costProfile) %>% arrange(DATE) %>%
# mutate(cumMonthlySavings = cumsum(monthlySavings) - solarInvestment) %>% ungroup()
allScenarios = rbindlist(genPreds) %>% group_by(costProfile) %>% arrange(DATE) %>%
mutate(cumMonthlySavings = cumsum(monthlySavings) - solarInvestment,
realizedPricePerKwh = monthlySavings/monthlyGenKwh) %>% ungroup()
g1 <- ggplot(data = allScenarios,aes(x=DATE, y=cumMonthlySavings)) +
ggtitle("Solar investment scenarios") +
xlab(NULL) + ylab("Cumulative net savings") +
# geom_line(aes(colour=variable, group=variable)) +
geom_line(aes(colour=costProfile),
size=1) + theme(legend.position="none") #+ geom_text()
# geom_dl(aes(label=costProfile), method="last.points")
# geom_line(aes(y=solarInvestment, color='Investment'))
g2 <- ggplot(data = allScenarios,aes(x=DATE, y=realizedPricePerKwh)) +
ggtitle("Solar investment scenarios") +
xlab(NULL) + ylab("Generated Kwh price") + # theme(legend.position="none") +
# geom_line(aes(colour=variable, group=variable)) +
geom_line(aes(colour=costProfile),
size=1)
# geom_dl(aes(label=costProfile), method="last.points")
#multiplot(g1, g2, cols=1) %>% ggplotly()
allScenariosPayback = allScenarios %>% group_by(costProfile) %>%
arrange(DATE) %>% filter(cumMonthlySavings > 0) %>%
filter(cumMonthlySavings == min(cumMonthlySavings)) %>%
mutate(paybackPeriodDays = DATE - solarInstallDate,
paybackYrs = as.numeric(paybackPeriodDays)/365) %>%
select(costProfile, paybackYrs, DATE)
allScenariosReturnROI = allScenarios %>% group_by(costProfile) %>%
arrange(DATE) %>% summarise(IRR = irr(c(-solarInvestment,
monthlySavings))*12*100,
NetReturn=last(cumMonthlySavings),
ROI=NetReturn/solarInvestment*100)
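irr() here presumably comes from a finance package loaded elsewhere in the workshop code. Conceptually, the internal rate of return is the periodic discount rate at which the net present value of the cash-flow vector is zero, which base R can locate with uniroot() (a sketch; the cash flows are made up):

```r
# NPV of a cash-flow vector at periodic rate r (first element is at t = 0)
npv <- function(r, cf) sum(cf / (1 + r)^(seq_along(cf) - 1))
# IRR: the rate at which the NPV crosses zero
irr_base <- function(cf) uniroot(npv, interval = c(-0.5, 1), cf = cf, tol = 1e-9)$root
cf <- c(-1000, rep(100, 15))   # made-up: 1000 outlay, then 100 per period
round(irr_base(cf), 4)         # a bit over 5.5% per period
```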
mySummary = allScenariosPayback %>% left_join(allScenariosReturnROI)
# round off lengthy decimal points
is.num <- sapply(mySummary, is.numeric)
mySummary[is.num] <- lapply(mySummary[is.num], round, 2)
mySummary = mySummary %>% arrange(-ROI) %>% mutate(NetReturn = sprintf("$%.0f",NetReturn),
ROI = sprintf("%.0f %%", ROI),
IRR = sprintf("%.0f %%", IRR))
sumPlot <- plot_ly(
type = 'table',
header = list(
values = c(colnames(mySummary)),
align = c('left', rep('center', ncol(mySummary))),
line = list(width = 1, color = 'black'),
fill = list(color = '#daf2ed'),
font = list(family = "Arial", size = 14, color = "black")
),
cells = list(
values = rbind(
t(as.matrix(unname(mySummary)))
),
align = c('left', rep('center', ncol(mySummary))),
line = list(color = "black", width = 1),
fill = list(color = c('rgb(235, 193, 238)', 'rgba(228, 222, 249, 0.65)')),
font = list(family = "Arial", size = 12, color = c("black"))
))
subplot(g1,g2, nrows=2, titleY = TRUE)
sumPlot
setwd(dataDir)
# Billing data analysis
pgeElectricBill = tbl_df(read.csv(pgeElectricBilling, stringsAsFactors = F,
skip = electricSkipLines))
pgeElectricBill$START.DATE = as.Date(pgeElectricBill$START.DATE,"%Y-%m-%d")
pgeElectricBill$END.DATE = as.Date(pgeElectricBill$END.DATE,"%Y-%m-%d")
pgeElectricBill$COST = as.numeric(sub("\\$","", pgeElectricBill$COST))
pgeElectricBill$numDays =as.integer(pgeElectricBill$END.DATE -
pgeElectricBill$START.DATE)
pgeElectricBill$avgDailyUsage = pgeElectricBill$USAGE/pgeElectricBill$numDays
pgeElectricBill$KwhCost = pgeElectricBill$COST/pgeElectricBill$USAGE
pgeElectricBill = pgeElectricBill %>% select(-NOTES) %>% mutate(year = year(START.DATE),
month = month(START.DATE))
g1 = ggplot(pgeElectricBill) + aes(x = START.DATE) + geom_line(aes(y = USAGE)) +
scale_colour_manual(values=c("red", "green")) +
xlab(NULL) + ylab("Monthly Usage (Kwh)") + #+
ggtitle("Monthly Electric usage")
g2 = ggplot(pgeElectricBill) + aes(x = START.DATE) + geom_line(aes(y = COST)) +
xlab(NULL) + ylab("Monthly Cost($)") + #+
scale_colour_manual(values=c("red", "green")) +
ggtitle("Monthly Electric usage and $$ spend")
subplot(g1,g2,nrows=2, titleY = TRUE)
lastSolarReading = last(solarDf$DATE)
l = nrow(pgeElectricBill)
s = seq(as.Date(pgeElectricBill$START.DATE[1]), as.Date(pgeElectricBill$END.DATE[l]), "days") %>%
tibble::enframe(name=NULL)
colnames(s) = c("DATE")
s = s %>% mutate(year=year(DATE),month=month(DATE)) %>% filter(DATE < lastSolarReading)
s2 = s %>% full_join(pgeElectricBill, by=c("year","month")) %>% mutate(usage=avgDailyUsage) %>%
select(DATE,usage,KwhCost) %>% filter(!is.na(usage))
#KwhCost not useful because of extreme range of values
sDf = getSolarDf()
# merge with solarDf, the ts seems to have missing dates
s3 = s2 %>% left_join(sDf, by="DATE") %>% mutate(genKwh = energyProducedWh/1000,
netUsage = usage) %>% select(DATE, netUsage, genKwh)
s3$genKwh[s3$DATE < solarInstallDate] = 0
#s2[is.na(s2)] <- 0
s3 = s3 %>% mutate(usage=netUsage+genKwh)
# get monthly values, remove partial months
s3 <- s3 %>%
mutate(month = month(DATE),
year = year(DATE),
DATE = floor_date(DATE, "month")) %>%
group_by(year, month, DATE) %>%
add_count() %>%
summarise(monthlyUsage = sum(usage),
numDays = n(),
monthlyNetUsage = sum(netUsage),
monthlyGen = sum(genKwh)) %>% ungroup() %>%
filter(numDays >= 25) %>%
select(-numDays)
s3$context = "No Solar"
s3$context[s3$DATE > solarInstallDate] = "Solar"
s3$context[s3$DATE > electricCarDeployDate] = "Solar & Electric Car"
g1 = ggplot(s3) + aes(x = DATE) + geom_line(aes(y = monthlyUsage, colour = context)) +
xlab(NULL) + ylab("Monthly Usage (Kwh)") + #+
ggtitle("Monthly Electric usage & generation")
g2 = ggplot(s3) + aes(x = DATE) + geom_line(aes(y = monthlyGen, colour = context)) +
xlab(NULL) + ylab("Monthly Generation (Kwh)") + #+
ggtitle("Monthly Electric usage & generation")
g3 = ggplot(s3) + aes(x = DATE) + geom_line(aes(y = monthlyNetUsage, colour = context)) +
xlab(NULL) + ylab("NEM(Kwh)") + #+
ggtitle("Monthly Electric usage & generation")
subplot(g1,g2,g3,nrows=3, titleY = TRUE)